Extractive Summarization: Experimental work on nursing notes in Finnish
Natural Language Processing (NLP) is a subfield of artificial intelligence and linguistics that is
concerned with how computers interact with human language. With increasing
computational power and advances in technology, researchers have proposed
various NLP tasks, many of which have already been implemented as real-world applications.
Automated text summarization is one of the many tasks that have not yet fully matured,
particularly in the health sector. Success in this task would enable healthcare professionals to grasp
a patient's history in minimal time, resulting in the faster decisions required for better care.
Automatic text summarization is a process that shortens a large text without sacrificing
important information. This can be achieved by paraphrasing the content, known as the abstractive
method, or by concatenating relevant extracted sentences, known as the extractive method. In general, this
process requires converting the text into numerical form and then applying a method to identify
and extract the relevant text.
This thesis explores NLP techniques used in extractive text summarization,
particularly in the health domain. The work includes a comparison of basic summarization models
implemented on a corpus of patient notes written by nurses in Finnish. The concepts and
research studies required to understand the implementation are documented along with a
description of the code.
A Python-based project is structured to build a corpus and execute multiple summarization models. For
this thesis, we observe the performance of two textual embeddings: Term Frequency - Inverse
Document Frequency (TF-IDF), which is based on a simple statistical measure, and Word2Vec, which is
based on neural networks. For both models, LexRank, an unsupervised stochastic graph-based
sentence scoring algorithm, is used for sentence extraction, and a random selection method is used as a
baseline for evaluation.
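As an illustration only, the TF-IDF LexRank pipeline and the random baseline described here can be sketched in a few lines of Python. This is not the thesis code: the function names, the whitespace tokenizer, the smoothed IDF formula, the damping factor, and the English example sentences are all assumptions made for this sketch. It uses the continuous (unthresholded) variant of LexRank with a fixed number of power iterations.

```python
import math
import random
from collections import Counter


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (a dict) per sentence.

    Assumption: naive lowercase whitespace tokenization; each sentence
    is treated as a "document" when computing document frequencies.
    """
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so terms found in every sentence keep a small weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lexrank_scores(vectors, damping=0.85, iterations=100):
    """Score sentences by power iteration over the similarity graph."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    trans = []
    for row in sim:
        total = sum(row)
        # Row-normalize into transition probabilities; an empty sentence
        # (all-zero row) falls back to a uniform row.
        trans.append([x / total for x in row] if total else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * trans[j][i] for j in range(n))
            for i in range(n)
        ]
    return scores


def summarize(sentences, k):
    """Pick the k highest-scoring sentences, kept in original order."""
    scores = lexrank_scores(tfidf_vectors(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def random_baseline(sentences, k, seed=None):
    """Baseline: pick k sentences uniformly at random, in original order."""
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(len(sentences)), k))
    return [sentences[i] for i in picked]


# Made-up example sentences, not taken from the thesis corpus.
notes = [
    "patient reports pain in left knee",
    "pain in left knee eased after medication",
    "medication given for knee pain",
    "family visited during afternoon",
]
print(summarize(notes, 2))
print(random_baseline(notes, 2, seed=0))
```

A Word2Vec variant of this sketch would replace only `tfidf_vectors` with a function that returns dense sentence vectors (for example, averaged word embeddings) and adapt `cosine` accordingly; the graph scoring step would stay the same.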
To evaluate and compare the performance of the models, summaries of 15 patient care episodes from each
model were provided to two human evaluators for manual evaluation. On this
small sample, both evaluators agree with each other in preferring the
summaries produced by Word2Vec LexRank over those generated by TF-IDF LexRank.
Both evaluators also judged both models to perform better than the baseline model of
random selection.
Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation
We assembled an ancestrally diverse collection of genome-wide association studies (GWAS) of type 2 diabetes (T2D) in 180,834 affected individuals and 1,159,055 controls (48.9% non-European descent) through the Diabetes Meta-Analysis of Trans-Ethnic association studies (DIAMANTE) Consortium. Multi-ancestry GWAS meta-analysis identified 237 loci attaining stringent genome-wide significance (P < 5 × 10^-9), which were delineated to 338 distinct association signals. Fine-mapping of these signals was enhanced by the increased sample size and expanded population diversity of the multi-ancestry meta-analysis, which localized 54.4% of T2D associations to a single variant with >50% posterior probability. This improved fine-mapping enabled systematic assessment of candidate causal genes and molecular mechanisms through which T2D associations are mediated, laying the foundations for functional investigations. Multi-ancestry genetic risk scores enhanced transferability of T2D prediction across diverse populations. Our study provides a step toward more effective clinical translation of T2D GWAS to improve global health for all, irrespective of genetic background.